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Abstract 

The concept of homogeneity plays a critical role in statistics, both in its applications as well as its theory. 
Change point analysis is a statistical tool that aims to attain homogeneity within time series data. This is 
accomplished through partitioning the time series into a number of contiguous homogeneous segments. The 
applications of such techniques range from identifying chromosome alterations to solar flare detection. In this 
manuscript we present a general purpose search algorithm called cp3o that can be used to identify change 
points in multivariate time series. This new search procedure can be applied with a large class of goodness of fit 
measures. Additionally, a reduction in the computational time needed to identify change points is accomplish 
by means of probabilistic pruning. With mild assumptions about the goodness of fit measure this new search 
algorithm is shown to generate consistent estimates for both the number of change points and their locations, 
even when the number of change points increases with the time series length. 

A change point algorithm that incorporates the cp3o search algorithm and E-Statistics, e-cp3o, is also 
presented. The only distributional assumption that the e-cp3o procedure makes is that the absolute atli 
moment exists, for some a £ (0,2). Due to this mild restriction, the e-cp3o procedure can be applied to a 
majority of change point problems. Furthermore, even with such a mild restriction, the e-cp3o procedure 
has the ability to detect any type of distributional change within a time series. Simulation studies are used 
to compare the e-cp3o procedure to other parametric and nonparametric change point procedures, we we 
highlight applications of e-cp3o to climate and financial datasets. 
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1 Introduction 


The analysis of time ordered data, referred to as time series, has become a common practice in both academic 
and industrial settings. The applications of such analysis span may different fields, each with its own analytical 
tools. Such fields include network security (Blazek et al., 2001; Siris and Papagalou, 2006), fraud detection 
(Akoglu and Faloutsos, 2010; Fawcett and Provost, 1997), financial modeling (Andrcou and Ghysels, 2002; Dias 
and Embrechts, 2004), climate analysis (Wang et ah, 2013), astronomical observation (Friedman, 1996; Xic et al., 
2013), and many others. 

However, when analysis is performed it is generally assumed that the data adheres to some form of homogeneity. 
This could mean a range of things, depending upon the application area. Some common types of assumed 
homogeneity include: constant mean, constant variance, and strong or weak stationarity. Depending on the 
nature of these assumptions it may not be appropriate, or practical, to apply a given analytical procedure to 
many different types of time series. For instance, an algorithm that assumes weak stationarity would not be 
suitable for analyzing data that follows a Cauchy distribution, because of its infinite expectation. Furthermore 
many time series of real data can be seen, even through visual inspection, to violate such homogeneity conditions. 

Results obtained under such model misspecification can vary in their degree of inaccuracy (('hong et ah, 
1995). The resulting bias from such misspecification is one of the reasons for the current resurgence of change 
point analysis. Change point analysis attempts to partition a time series into homogeneous segments. Once 
again the definition of homogeneity will depend upon the application area. In this paper we will use a notion 
of homogeneity that is common in the statistical literature. We will say that a segment is homogeneous if all of 
its observations are identically distributed. Using this definition of homogeneity, change point analysis can be 
performed in a variety of ways. 

In this paper we consider the following formulation of the offline multiple change point problem. Let 
Z\, Zii ■ ■ ., Zt G R d be a length T sequence of independent d-dimensional time ordered observations. The 
dimension of our observations is arbitrary, but assumed to be fixed. Additionally, let Fq, F\, F %,..., be a 
(possibly infinite) sequence of distributional functions, such that F t ^ iq+i ■ It is also assumed that in the 
sequence of observations, there is at least one distributional change. Thus, there exists k(T) > 1 time indices 

0 = 7o,t < Tqr < • • • < T k (T),T < T k(T)+i,T = T, such that Zi Fj, for t^t < i < Tj. From this notation it is 
clear that the locations of change points T :h T depend upon the sample size. However, we will usually suppress this 
dependence and use the notation for simplicity. The challenge of multiple change point analysis is to provide a 
good estimate of both the number of change points, fc(T), as well as their respective locations, r 1; r 2 ,... Tk(T)- I n 
some cases it is also necessary to provide some information about the distributions Fg,... ,F k ( T y However, once 
a segmentation is provided it is usually straight-forward to obtain such information. 

A popular approach is to fit the observed data to a parametric model. In this setting a change point corresponds 
to a change in the monitored parameter(s). Earlier work in this area assumes Gaussian observations and proceeds 
to partition the data through the use of maximum likelihood (Maboudou-Tchao and Hawkins, 2013). More 
recently, extensions to other members of the Exponential family of distributions and beyond have been considered 
(( 'lien and Gupta, 2011). In general, all of these approaches rely on the existence of a likelihood function with 
an analytic expression. Once the likelihood function is known, analysis is reduced to finding a computationally 
efficient way to maximize the likelihood over a set of candidate parameter values. 

Parametric approaches however, rely heavily upon the assumption that the data behaves according to the 
predefined model. If this is not the case, then the degree of bias in the obtained results is usually unknown 
(Pitarakis, 2004). In practice, it is almost always difficult, if not impossible, to test for adherence to these 
assumptions. Under such settings, performing nonparametric analysis is a natural way to proceed (Brodsky 
and Darkhovsky, 1993). Since nonparametric approaches make much weaker assumptions than their parametric 
counterparts they can be used in a much wider variety of settings; for example, the analysis of internet traffic 
data, where there is no commonly accepted distributional model. Even though these methods do not directly 
impose a distributional model for the data, they do make their own types of assumptions (Zou et al., 2014). For 
instance, a common assumption is the existence of a density function, which then allows practitioners to perform 
maximum likelihood estimation by using estimated densities. However, estimation becomes inaccurate and time 
consuming when the dimension of the time series increases (Hastie et al., 2009). 

Performing multiple change point analysis can easily become computationally intractable. Usually the number 
of true change points is not known beforehand. However, even if such information were provided, finding the 
locations is not a simple task. For instance, if it is known that the time series contains k change points then 
there are 0(T k ) possible segmentations. Thus naive approaches to find the best segmentation quickly become 
impractical. More refined techniques must therefore be employed in order to obtain change point estimates in a 
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reasonable amount of time. 

Most existing procedures for performing retrospective multiple change point analysis can be classified as 
belonging to one of two groups. The first consists of search procedures which will return what are referred to as 
approximate solutions, while the second consist of those that produce exact solutions. As indicated by the name, 
the approximate procedures tend to produce suboptimal segmentations of the given time series. However, their 
benefit is that they tend to have provably much lower computational complexity than procedures that return 
optimal segmentations. 

Approximate search algorithms tend to rely heavily on a subroutine for finding a single change point. Esti¬ 
mates for multiple change point locations are produced by iteratively applying this subroutine. Such algorithms 
are commonly referred to as binary segmentation algorithms. In many cases it can be shown that binary seg¬ 
mentation algorithms have a complexity of 0{T\ogT). This type of approach to multiple change point analysis 
was introduced by Vostrikova (1981) and has since been adapted by many others. Such adaptations include the 
Circular Binary Segmentation approach of Olshen and Venkatraman (2004) as well as the E-Divisive approach of 
Matteson and James (2014). The Wild Binary Segmentation approach of Fryzlewicz et al. (2014) is a variation 
of binary segmentation that utilizes random intervals in an attempt to further reduce computational time. An 
extension of this approach to multivariate multiplicative time series called Sparsified Binary Segmentation has 
been produced by Cho and Fryzlewicz (2012). Each of these procedures have been shown to produce consistent 
estimates of both the number and locations of change points under a variety of model conditions. 

Exact search algorithms return segmentations that are optimal with respect to a prespecified goodness of 
fit measure, such as a log likelihood function. The naive approach of searching over all possible segmentations 
quickly becomes impractical for relatively small time series with a few change points. Therefore, in order to 
achieve a reasonable computational cost, the utilized goodness of fit measures often satisfy Bellman’s Principle 
of Optimality (Bellman, 1952), and can thus be optimized through the use of dynamic programming. However, 
in most cases the ability to obtain this optimal solution comes with a computational cost. Usually this results 
in at least 0(T 2 ) computational complexity. Examples of exact algorithms include the Kernel Change Point 
algorithm, (Harchaoui and Cappe, 2007) and (Arlot et al., 2012), and the MultiRank algorithm (Lung-Yut-Fong 
et al., 2011). The complexity of these algorithms also depends upon the number of identified change points. 
However, a method introduced by Jackson ct al. (2005), as well as the PELT algorithm of Killick et al. (2012) can 
both obtain optimal segmentations with running times that are independent of the number of change points. An 
additional benefit of the PELT approach is that under certain conditions it is shown to have an expected running 
time that is linear in the length of the time series. 

The second aspect of multiple change point analysis is the determination of the number of change points. 
The first technique that is commonly used by approximate search algorithms is hypothesis testing. This method 
continues to identify change points until they are unable to reject the null hypothesis of no change. Such an 
approach however, is not well suited for many procedures that use exact search algorithms, since many identify 
all change points at once, instead of sequentially, as is the case with binary segmentation. Many change point 
algorithms that use an exact search procedure instead turn to penalized optimization. The reasoning behind 
penalization is that a more complex model, in this case one with more change points, will better fit the observed 
data. The the penalty thus helps to guard against over segmentation. Yao (1987) showed that using the Schwarz 
Criterion can produce a consistent estimate of the number of change points. It has since become popular to 
maximize a penalized goodness of fit measure of the following form, 

k 

S(k,P)= max V C(Ci)+V(k), (1) 

T 1 <T 2 <-’-<T k 

1=1 

for a penalty function 'P(-) and measure of segmentation quality C(-). A common choice for the penalty function 
is V{k) = —/3k, for some user defined positive constant /3. This type of penalization only takes into consideration 
the number of change points, and not their location. There are other penalization approaches that not only 
consider the number of change points, but also the change point locations. See for instance Zhang and Siegmund 
(2007) and Hannart and Naveau (2012). 

An alternative to penalization is to instead generate all optimal segmentations with k change points, up to 
some prespecified upper limit. This corresponds to evaluating S(k, 0) from Equation 1 for a range of k values. 
However, depending on the choice of C(-) it may not be possible to efficiently calculate S(k, 0) for a range of k 
values. And thus the search procedure would have to be run numerous times, which can become rather inefficient. 
Penalization tends to be faster, but does require the specification of a penalty function or constant. This choice 
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is highly dependent upon the application field and will require some sort of knowledge about the data. Some 
ways to choose these parameters include cross validation (Arlot and Celisse, 2011), generalized degrees of freedom 
(Slien and Ye, 2002), and slope heuristics (Arlot and Massart, 2009). On the other hand, generating all optimal 
segmentations avoids having to make such a selection. 

In the following sections we introduce a change point search procedure which we call cp3o (Change Points via 
Probabilisticly Pruned Objectives). This is an exact search procedure that can be applied to a larger number 
of goodness of fit measures in order to reduce the amount of time taken to estimate change point locations. 
Additionally, the cp3o algorithm allows for the number of change points to be quickly determined without having 
to specify a penalty term, while at the same time generating all other optimal segmentations as a byproduct. 

As the cp3o procedure can be applied to a general goodness of fit measure we propose one that is based on 
E-Statistics Szekely and Rizzo (2005), which we call e-cp3o. The e-cp3o method is a nonparametric algorithm 
that has the ability to detect any type of distributional change. The use of E-Statistics also allows the e-cp3o 
algorithm to perform analysis on multivariate time series without suffering from the curse of dimensionality. 

The results from a variety of simulations show that our method makes a reasonable trade off between speed and 
accuracy in most cases. In addition to the computational benefits, we show that the cp3o procedure generates 
consistent estimates for both the number and location of change points when equipped with an appropriate 
goodness of fit measure. Furthermore, under additional assumptions we also show consistency in the setting 
where the number of change points is increasing with the length of time series. 

The remainder of this paper is organized as follows. In Section 2 we discuss the probabilistic pruning procedure 
used by cp3o, along with conditions necessary to ensure consistency. Section 3 is devoted to the development of 
the e-cp3o algorithm and showing that it satisfies the conditions outlined in Section 2. Results for applications 
to both simulated and real datasets are given in Sections 4 and 5. Concluding remarks are left for Section 6. 

2 Probabilistic Pruning 

When performing change point analysis one must have a quantifiable way of determining whether one segmentation 
is better than another. When using an exact search procedure this is most commonly accomplished through the 
use of a goodness of fit measure. Suppose that there are k change points 0 = To < n < ■ • • < Tfc < Tk+i = T. 
These k locations partition the time series into k + 1 segments Cj = {Z t : tj_i < i < tj}. The challenge now is 
to select the change point locations so that the observations within each segment are identically distributed, and 
the distribution of observations in adjacent segments are different. Therefore, we will consider sample goodness 
of fit measures of the following form, 


k 

G(k)= max J2^Cj,C j+1 ), (2) 

Tl<T2<---<Tfc z -' 

J=1 

in which R(A, B) is a measure of the sample divergence between observation sets A and B. The divergence 
measure R is such that larger values indicate that the distributions of the observations in the two sets are more 
distinct. Since each term of the sum in Equation 2 depends only upon contiguous observations it is possible to 
obtain the value of G{k) through dynamic programming. 

Using traditional dynamic programming approaches greatly reduces the computational time required to per¬ 
form the optimization in Equation 2. However, the running time of such methods is still quadratic in the length 
of the time series, thus limiting their applicability. Many of the calculations performed during the dynamic pro¬ 
grams do not result in the identification of a new change point. These calculations can be viewed as excessive 
since they do not provide any additional information about the series’ segmentation, and quickly compound to 
slow down the algorithm. Thus a practical step towards reducing running time, and even possibly the theoretical 
computational complexity, is to quickly identify such excessive calculations and have them removed. One way to 
do this is by continually pruning the set of potential change point locations. Rigaill (2010) proposes a pruning 
method that can be used when the goodness of fit measure is convex, and can also be adapted for online change 
point detection. Since the sample divergence measure R is not necessarily convex this pruning approach may 
not always be applicable. The cp3o procedure therefore performs pruning that is more in line with the approach 
taken by Killick et al. (2012) in developing the PELT method. 

Let 0 < v < t < s < u <T and Z h a = {Z a i Z a ^-i , . • • 5 for cl ^ b. Fu.ithGrmor6 5 suppose that there exists a 
constant T such that 

R(Zt+i, Z't+i) - R(zl +1 ,z s t+1 ) - R(Z° +1 , z: +1 ) < r 
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holds for all v < t < s < u. The value of T will depend not only on the distribution of our observations, but also 
the nature of the divergence measure R. Therefore, in many settings it may be difficult, if not impossible, to find 
such a T. Instead we consider the following probabilistic formulation. Let e > 0, we then wish to find T e such 
that for all v < t < s < u 

v(r(zZ +1 , Z? +1 ) - R(zt +1 ,z; +1 ) - R(z? +1 ,z: +1 ) > r e ) < e. 

Let Ck(t) denote the value of G(k) when segmenting Z\, Z %,..., Z t with k change points. Using this notation we 
can express our probabilistic pruning rule as follows. 

Lemma 1. Let v be the optimal change point location preceding t, with t < s < u, < T. If 

Ut) + R{ZUi,Z s t+l ) + T t <f k {s), 

then with probability at least 1 — e, t is not the optimal change point location preceding u. 

Proof. If 

Cfc(t) + R{Zl +1 , Zf +1 ) + R(Zf +1 ,Z™ +1 ) + T e < ( k (s) + R(Zf +1 ,Z: +1 ), 

then from the definition of T e we have that 

p(a(t) + R(ZZ +1 ,Zf + 1 ) < a(s) + R(Z? +1 ,Z™ + 1 )) > 1 - e. 

Since the optimal value attained from segmenting Z\, Z 2 ,..., Z u with k + 1 change points is an upper bound for 

Ck(t) + R(z; +1 ,z? +1 ), 

p(a(t) + R(Zl +1 ,Zf +1 ) < Ck +1 («)) > 1 - e. 

From this we can see that with probability at least 1 — e, it would be better to have s as the change point prior 
to u. E§ 


2.1 Consistency 

As has been mentioned before, when performing multiple change point analysis it is of utmost importance to 
obtain an accurate estimate of the number of change points, as well as their locations. Therefore, in this section 
we will show that under a certain asymptotic setting, the estimates generated by maximizing Equation 2 generate 
consistent location estimates. 

When showing consistency many authors consider the case in which the number of change points is held 
constant, while the number of observations tends toward infinity. This seems rather unrealistic, as one would 
expect to observe additional change points as more data is collected. For this reason we will allow the number of 
change points, k(T), to possibly tend towards infinity as the length of the time series increases. The asymptotic 
setting we will consider is similar in nature to that taken by Venkatraman (1992) and Zou et al. (2014). 

In order to establish consistency of the proposed estimators we make the following assumptions. 

Assumption 1. Let LP = {F 0 , iq ,... } be a collection of distribution functions, and {Rje} a collection of doubly 
indexed positive finite constants. Suppose that A and B are disjoint sets of observations, such that the observations 
in A have distribution F a and those of B hare distribution F k , for F a ,Fb £ &. The constants Rjt are such 
that R a b = R(A,B) —> R a b almost surely as imn{ffA,ffB) —> 00 . Furthermore let f be a function such that 
| R a b — R a b I = o(f(#A A ffB)) almost surely for all pairs a and b. 

Assumption 2. Let At = min (r,- — r,._ 1 ), and suppose At —t 00 as T —> 00 . 

l<j<k(T) 

Assumption 2 states that the number of observations between change points tends towards infinity. This later 
allows us to apply the law of large numbers. 

Assumption 3. The number of change points k{T) and its upper bound K(T ) are such that k(T ) = o(^j 
and k(T) = o(K(T)). 
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The above assumption controls the rate at which the number of change points can increase. This is directly 
related to the rate at which our sample estimates converge almost surely to their population counterparts. 

Assumption 4. Let & be the collection of distribution functions from Assumption 1. From this collection we 
define a set of random variables {X(a,b)}^ b _ 0 . For each pair of values 0 < a < b the random variable X(a,b) 
has a mixture distribution created with mixture components F a ,F a+ i,... ,Ff,. 

Then for r < q, and integers 0 = sq < S\ < S 2 < ■ ■ ■ < s r < s r +i = q + 1, define, 

r 

G q (r)= max ^' R(X(sj-i, Sj - 1), X(sj, s i+1 - 1)); 

Si,S2,...,S r z ' 

2 — 1 

and for r > q we define G q (r) to be equal to G q (q). Assume that dr = max G k ^ T \i) — G k ^ T \j ), is such that 

l<i,j<k(T) 

[d T k(T)\/[K(T)] -> 0 as T-> oo. 

Assumption 4 concerns the rate at which additional change points increase the objective function of interest. 
We will show that a higher upper bound on the number of change points is necessary when each additional change 
point has the potential to greatly change the value of G q (r). 

Assumption 5. Let 0 < 7 T< 7 <P <1 and *i < i 2 < *3 < *4 positive integers. Suppose that the 
time series Z\,Z 2 ,... ,Zt is such that Zy vT t +1 ,..., Zy^ T , ~ X(ii,i 2 ) and Zy lT ^ +1 ,..., Zy pT y ~ X(i 3 ,ii) for 
every sample of size T. For j £ (77 p) we define the following sets A( 7 ) = , Z\~t\} and B( 7 ) = 

{Zy^T^+h ■ ■ ■ 1 Zy p T \} ■ Assume that there exist a class of functions indexed by 71 , 7 , and p; ©£( 717 ) : ( 7 r,p) 1 —> R, 
such that R{A(0^(jjj)R(X(ii, i 2 ), X(i 3 , * 4 )) almost surely as T —► 00 . Finally we assume that 
©^Cblb) has a unique maximizer at 7 = 7 . 

Assumption 5 describes the behavior of our goodness of fit measure when it is used to identify a single change 
point. Essentially this assumption states that the measure will attain its maximum value when the estimated 
change point location, 7 , and true change point location, 7 , coincide. 

Change Point Locations 

We begin by showing that under Assumptions 1-5, the cp3o procedure will produce consistent estimates for the 
change point locations. 

Lemma 2. Let Q(k{T)) = { 77 , t 2 , ..., fk(T)} an d 


B T (e) = B T {e,{Ti}) = |(r?i, 772 ,.. • , Vk(T)) £ ® fc(T) : ^ T ^ < e for i = 1,2,.. .,k(T) j . 

Then for all e > 0, 

P(G(k(T))£B T (e))^l 

as T —>• 00 . 

Proof. Suppose that Q{k(T)) ^ £>t(e), then there exists i such that > e. Select the largest such i and 

define the following random variables. Let M\ ~ U\ where U\ is the distribution (possibly a mixture) created by 
the observations between t;_ 2 and fj_i, M 2 ~ U 2 for U 2 having distribution created by the observations between 
fj_ 1 and r t . Similarly define M 3 for the observations between and and M 4 for the observations between 
Ti+i and f i+2 . 

Then the value of the sample goodness of fit measure generated by the estimates of Q(k(T)) is 

R(M u M 2 ) + R{M 2 , M 3 ) + R(M 3 , Mi) + A, 

which due to Assumptions 1 and 3 is equal to 

©o(/3i|7i)-R(^1i U 2 ) + ®o{P 2 \l 2 )R{U 2 , U 3 ) + ® 0 (^ 3 1 73 )^(^3) ^ 4 ) + B + k(T)o(f(T)). 

In the above expressions A and B are collections of terms that are not affected by the choice of f». The fit and 
7 i terms are as listed below. 
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fh = 


Ti-l—Ti-2 




7i = 


Ti-l—Ti-2 


Ti—Ti — l 

72 r i +1 — Ti-i 


Ti+l—Ti 
f; + 2 — Ti 


Each of the terms in the sum is maximized when /?, =7*, which corresponds to 


\Tj-Tj\ 

T 


0. By our assumptions, 


we have that the remainder term fc(T)o(/(T)) = o(l). Therefore, if is strictly bounded away from 0 then 


the statistic will be strictly less than the optimal value as T 
Ti is selected. 


oo. However, this contradicts the manner in which 

□ 


Number of Change Points 

Once we have shown that the procedure generates consistent estimate for the change point locations it is simple 
to show that it will also produce a consistent estimate for the number of change points. We have chosen to 
implement the procedure outlined below in Assumption 6 to determine the number of change points. However, 
other approaches could be used and still have the same consistency result. 

Assumption 6. Define VG(fc) = G(k + 1) - G{k), £(VG) = 5 ( ^gjlf (1) and ct 2 (VG) = VAr{VG{k) : k = 
1, 2,..., K(T) — 1}. Then suppose our estimated number of change points is given by 


(3) 


k(T) = 1 + max \l : VG(1),..., VG(f) > £(VG) + - \/d 2 (VG) 


The selection procedure in Equation 3 has similar intuition to the one presented by Lavielle (2005). Both 
procedures work on the principle that a true change point will cause a significant change in the goodness of fit. 
While spurious change point estimates will only cause a minuscule increases/decrease in value. We thus say that 
a change is significant if it is more than half a standard deviation above the average change. As previously stated, 
other methods could be used, this just happens to be the one that we chose to implement. 

Before proving that we can obtain a consistent estimate for the number of change points one final assumption 
is made to ensure that the detection of an additional true change point causes a strictly positive increase in our 
goodness of fit measure. A similar property is also needed for the finite sample approximation. 

Assumption 7. For every fixed q, suppose the values G 9 (l), G 9 (2),..., G q (q) form an increasing sequence. 
Similarly let G q (r ) be the finite sample estimates of G q [r). Additionally, suppose there exists Tq such that for 
T > To the values G fe l T )(l), G fe ^ r ^(2),..., G k l T l(k(T)) also form an increasing sequence. 

Lemma 3. Let k(T) be the number of estimated change points for a sample of size T, and that the conditions of 
Assumptions 1-1 hold. Then 

lim P (k{T) = fc(T)) = 1. 

Proof. If k(T) is bounded then the proof for the constant k(T) version applies. Suppose that k[T) > fc(T), then 
VG k T)(k(T)) > ^(VG) + \\J(J 2 {S7G). Letting = \ [G k{ - T \i) - G k(T )(j)} 2 , we note the following inequalities: 


M(VG) = 


G k T)(K(T)) - G k{ T)( 1) 


< 


dp 


K(T) - 1 
->• 0 . 


K(T) - 1 
+ o(l) 
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<7 2 (VG) = 


y - [ 

{K(T) — 1)(K(T) — 2) ^ 2 L 


G k( - T \j)-G HT) {i) 1 +o(l) 


< 


^A'(T) - 1 

( T )" 1 

0 . 


E v « 

l<i<j<fc(T) 


E 

l<i<k(T)<j<K(T)-l 


E v 5 

k(T)<i<j<K(T)-l 


(^y T+km( K m - H T ))dT+ ( K ^-^ H d 


+ o(l) 

o(l) 


since G fe ( T )(f) = G fe ( T )(fc(T)) for i > Jfc(T). Thus P(jfe(T) > jfc(T)) -> 0 as T oo. 

Next suppose that fc(T) < fc(T). This implies that VG fe ^ r ^(fc(T)) < p(VG) + \\Ja 2 {\/G). However, since 
VG fc ( T )(*) > 0 for i < k(T ), P(fc(T) < k{T)) -S> 0 as T -> oo. □ 

Theorem 4. For all e > 0, as T —» oo, 

p(fc(T) = k(T),g{k{T)) € B r (e)) 1 - 
Proof. Using the results of Lemma 2 and Lemma 3 we have the following; 


{k(T)=k(T),g(k(T))GB T (e) 


> 1 - 

->■ 1 . 


P (fe(T) ^ fc(T)) - P (^(/c(T)) i B T (e)) 


□ 


3 Pruning and Energy Statistics 

The cp3o procedure introduced in Section 2 can be applied with almost any goodness of fit measure 
However, in order to ensure consistency for both the estimated change point locations, as well as the estimated 
number of change points, some restrictions must be enforced as outlined in Section 2.1. 

In this section we make use of a particular class of goodness of fit measures that allows for nonparametric 
change point analysis. These measures are indexed by a € (O^) 1 and allow for the detection of any type of 
distributional change. When a value of a is selected, the only distributional assumptions that are made are that 
observations are independent and that they all have finite absolute ath absolute moments. This class of measures 
are based upon the energy statistic of Szekely and Rizzo (2005), and we thus call the resulting procedure e-cp3o. 

The e-cp3o procedure is a nonparametric procedure that makes use of an approximate test statistic and an 
exact search algorithm in order to locate change points. Computationally the e-cp3o procedure is comparable 
to other parametric/nonparametric change point methodologies that use approximate search algorithms. In the 
remainder of this section we give a brief review of E-Statistics, followed by their incorporation into the cp3o 
framework. Finally, we show that the resulting goodness of fit measure satisfies the conditions necessary for 
consistency. 

3.1 The Energy Statistic 

As change point analysis is directly related to the detection of differences in distribution we consider the U- 
statistic introduced in Szekely and Rizzo (2005). This statistic provides a simple way to determine whether the 
independent observations in two sets are identically distributed. 

Suppose that we are given samples X n = {Xi : i = l,...,n} and Y m = {Yj : j = l,...,m}, that are 
independent iid samples from distributions Fx and Fy respectively. Our goal is to determine if Fx = Fy. We 
then define the following metric on the space of characteristic functions, 

D{X,Y\oi)= [ \(f x (t) - cj)y(t)\ 2 uj{t\a)dt, 

JR d 

1 The choice of a = 2 is allowed, however, in this case the goodness of fit measure would only be able to detect changes in mean. 













i which (f> x and <f> y are the characteristic functions associated with distributions Fx and Fy respectively. Also 
uj(t\a) is a positive weight function chosen such that the integral is finite. By the uniqueness of characteristic 
functions, it is obvious that V(X,Y\a) = 0 if and only if Fx = Fy. 

Another metric that can be considered is based on Euclidean distances. Let (X ', Y') be an iid copy of (A, Y), 
then for a £ (0, 2) define 


£(X,Y |a) = 2E\X- Y\ a - E\X - X'\ a - E\Y - Y'\ a . 


(4) 


For an appropriately chosen weight function, 


ui(t | a) 


/ 27T <i / 2 r(l — a/2) 
\a2 a T{{d + a)/2) 


\t\ 


d-\-OL 


-1 


we have the following lemma. 

Lemma 5. For any pair of independent random variables X and Y, and a £ (0,2) is such that E(\X\ a + | Y|“) < 
oo, then £(X,Y\a) = T>(X,Y\a) and £(X,Y\a) £ [0, oo). Moreover, £(X,Y\a) = 0 if and only if X and Y are 
identically distributed. 


Proof. See the appendix of Szckely and Rizzo (2005). 


□ 


Theorem 5 allows for an intuitively simple empirical divergence measure. Let X n and Y m be as above, then 
we can define the empirical counterpart to Equation 4 


£(X 


m Y m , t 


) = Ar£D*-^r- 


i =1 3 =1 


-1 


E \ x i- x i\ a - 

l<i<j<n 


-1 


E 


Yi-Yj | 


(5) 


3.2 Incomplete Energy Statistic 

The computation of the U-statistics presented in Equation 5 require 0(n 2 V m 2 ) calculations, which makes it 
impractical for large n or to. We propose working with an approximate statistic that is obtained by using 
incomplete U-statistics. In the following formulation of the incomplete U-statistic let <5 £ {2,3,..., [v^T]}. 

Suppose that we divide a segment of our time series into two adjacent subseries, X n = {Z ai Z a+ i ,..., Z a+n _i} 
and Y m = { Z a+n , Z a+n+1 ,..., Z a+n+m _ i}, and define the following sets 


Wx 

B 5 


n—6—1 


{(i,j):a + n — 5<i<j<a + n} U {(a + z, a + i + 1)} 

i =o 

m —2 

{(*, j) : a + n < i < j < a + n + <5} U [J {(a + n + i, a + n + i + 1)} 


i—5 — 1 


({a + n — 1,..., a + n — <5} x {a + n ,..., a + n + S — 1}) U I |^J {(a + n — i, a + n + i — 1)} 

\i=< 5 +l / 


The set B s aims at reducing the number of samples needed to compute the between sample distances. While the 
sets Wx and Wy reduce the number of terms used for the within sample distances. When making this reduction 
the sets IU|-, Wy, and B s consider all unique pairs within a 6 window around the split that creates X n and Y rn . 
This point corresponds to a potential change point location and thus we use as much information about points 
close by to determine the empirical divergence. 

We then define the incomplete U-statistic £ as 


£(X n ,Y m \a,6) 


#B5 


E \ x i~Yj\ 




1 


E 

(bi)ew£ 


1 

#W? 


E 


( 6 ) 


Using this approximation greatly reduces our computational complexity from 0(n 2 V to 2 ) to 0(n V to). Nasari 
(2012) shows that a strong law of large numbers result holds for incomplete U-Statistics, and and thus £ and £ 
have the same almost sure limit as n A to —> oo. 
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3.3 The e-cp3o Algorithm 

We now present the goodness of fit measure that is used by the e-cp3o change point procedure. In addition, we 
show that the prescribed measure satisfies the necessary consistency requirements from Section 2.1. 

The e-cp3o algorithm uses an approximate test statistics combined with an exact search algorithm in order 
to identify change points. Its goodness of fit measure is given by the following weighted U-Statistic, 

zn 777 77 

K(X n , Y m \a ) = t- —^£(X n ,Y m \a). (7) 

(m + n) z 

Or an approximation can be obtained by using its incomplete counterpart 

777 77 ~ 

K(X n ,Y m \a,6) = -— ^£(X n ,Y m \a,d). 

[m + n) z 

By using Slutsky’s theorem and a result of Szekely and Rizzo (2010) we have that if F x = Fy then 1Z(X„ , Y m \a ) -A 
0, and otherwise lZ(X n ,Y m \a) tends almost surely to a finite positive constant, provided that m = 0(n) (this 
means that n = 0(m) and m = 0(n )). In fact if F x = Fy we have that lZ(X n , Y m \a, 6) —>■ 0 almost surely. 

In the case of 7Z(X n ,Y m \a), the result of (O’Neil and Redner, 1993, Theorem 4.1) combined and Slutsky’s 
theorem show that under equal distributions, F x = Fy , TZ(X n ,Y m \a,5) A 0. Similarly, 7Z(X n ,Y m \a, 5) also 
tends towards a positive finite constant provided m = 0(n). These properties lead to a very intuitive goodness 
of fit measure, 

k 

G(k\a) = max TZ(C^Cj + i\a). 

Tl<T2<"<T k 

j=i 

By using the dynamic programming approach presented by Lavielle and Teyssiere (2006), the values G(£\a) for 
i < k can be computed in Q(kT 3 ) instead of 0{T k+2 ) operations. However, the T 3 term makes this an inadequate 
approach, so the procedure (e-cp3o) is implemented with the similarly defined goodness of fit measure G(k\a , S ), 
which allows for only (D(kT 2 ) operations. 

Consistency of e-cp3o 

We now show that the goodness of fit measure, R(-,-\a,S), used by e-cp3o satisfies the conditions for a cp3o 
based procedure to generate consistent estimates. It is assumed that a has been chosen so that all of the crth 
moments are finite. In the results below we will consider the goodness of fit measure R based upon the complete 
U-Statistic, even though the e-cp3o procedure is based on its incomplete version R. The reason for this is that R 
and R have the same almost sure limits, and we are working in an asymptotic setting. 

Proposition 6. Assumption 5 is satisfied by the e-cp3o goodness of fit measure. 

Proof. Using the result of (Matteson and James, 2014, Theorem 1) we have that 

R(A{j),B(j)\a) ->■ 7(1 - i)h{,T,'y)^(.U 1 ,U 2 \a). 

Such that Z7i ~ X(ii,i 2 ) U 2 ~ X(i 3 ,u), and hfxyy) = (fl x > y + yrfl x <y^ ■ Therefore, R(X X{i 3 ,i A )) = 

7(1 — 7 )£{U \, U 2 \a) and 0o(7l7) = ft( 7 ; 7 ), which can be shown to have a unique maximizer at 7 = 7 . □ 

Proposition 7. The portion of Assumption 7 about {G m (r)}(!Z 1 holds for the e-cp3o goodness of fit measure. 

Proof. We begin by showing that G m (l) < G m ( 2). Suppose the first change point partitions the time series into 
two segments, one where observations are distributed according to F and another where they are distributed 
according to J. Now suppose that J is a created by a linear mixture of the distributions G and F[ (which may 
themselves be mixture distributions). Suppose that the second change point is positioned so as to separate these 
distributions, G and FI. Let random variables X , Y. and Z be such that X ~ F,Y ~ G, and Z ~ H. Then we 
have that 

G m (l) = [ \(f x (t) - (fy(l3t)(t) z {{l - /3)t)\ 2 u;(t\a)dt 

JWL d 
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where /3 is the mixture coefficient used to create the distribution J. It is clear that this will be maximized either 
when (3 = 0 or p = 1, in either case we will show that the obtained value is bounded above by 


G m (2)= [ \<j) x {t) - (j) y (t)\ 2 uj{t\a) dt + [ \(/)y(t) - (f z {f)\ 2 u{t\a) dt. 

J R d Js. d 


Case /3 = 1: In this setting the value of G m (l) is equal to the first term in the definition of G m ( 2), and since the 
distributions G and H are distinct, the second term is strictly positive. Thus G m (l) < G m (2). 

Case p = 0: In this case G m (l) = J \<j> x (t) — (f z (t)\ 2 cj(t\a)dt. However, since we have a metric, the triangle 
inequality immediately shows that G m ( 1) < G m (2). 

In the above setting the location of the first change point was held fixed when the second was identified. 
This need not be the optimal way to partition the time series into three segments. Thus since this potentially 
suboptimal segmentation results in an upper bound for G m (l) it follows that the optimal segmentation will also 
bound G m (l). 

The argument to show that G m (r) < G m (r + 1) for r = 2,..., to — 1 is identical. □ 


Proposition 8. 


The portion of Assumption 7 about 


|G fc ( T )(r)} 


k(T) 
r =1 


holds for the e-cp3o goodness of fit measure. 


Proof In the paper Szekely and Rizzo (2005), the empirical measure used for the statistic £ is based upon V- 
statistics, while we instead use U-statistics. The use of V-statistics ensures that the statistic will always have a 
nonnegative value. This isn’t the case when using U-statistics, but the difference in their value can be bounded 
by a constant multiple of ^. Combining this with the fact that 0 < G k ^ T \r) < G klyT \r + 1), and —> 0, we 
conclude that for T large enough the version of the statistics based on U-statistics will also produce nonnegative 
values. Therefore for T large enough G k ( T \r) < G fc ( T )( r + 1). □ 


4 Simulation Study 


We now show the effectiveness of our methodology by considering a number of simulation studies. The goal 
of these studies is to demonstrate that the e-cp3o procedure is able to perform reasonably well in a variety of 
settings. In these studies we examine both the number of estimated change points as well as their estimated 
locations. 

To assess the performance of the segmentation obtained from the e-cp3o procedure we use Fowlkes and Mallows’ 
adjusted Rand index (Fowlkes and Mallows, 1983). This value is calculated by comparing a segmentation based 
upon estimated change point locations to the known true segmentation. The index takes into account both the 
number of change points as well as their locations, and lies in the interval [0,1], where it is equal to 1 if and only 
if the two segmentations are identical. 

For each simulation study we apply various methods to 100 randomly generated time series. We then report 
the average running time in seconds, the average adjusted Rand value, and the average number of estimated 
change points. 

As the simulations in the following sections will demonstrate, the e-cp3o procedure does not always generate 
the best running time or average Rand values. However, in every setting it generates results that are either better 
or comparable to almost all other competitors, when accuracy and speed are viewed together. For this reason we 
would advocate the use of the e-cp3o procedure as a general purpose change point algorithm, especially for small 
to moderate length time series. 

To perform the probabilistic pruning introduced in Section 2 the value of r e must be specified. In our 
implementation we obtain an estimate of r e in the following way. We uniformly draw d(-) random samples from 
the set 

{(v,t,s,u) : v < t < s < u and min{i — v, s — t, u — s} > <5} . 

For each sample we calculate 


R(Zl +1 , Zf +1 -a) - R(Zl +1 , Z s t+1 -a) - R(Zf +1 , Z“ +1 ;a), 


and then set T e equal to the 1 — e quantile of these quantities. Any other sampling approach could be used to 
obtain a value for F e as long it satisfies the probabilistic criterion. 
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PELT E-Div e-cp3o 


T=4QQ, k(T)=3, K(T)=9 


Rand 

0.884 7xlo -3 

0.987 2x io-3 

0.937 10 -2 

# of cps 

9.150o.4 

3.070 3x i O -2 

2.660 8x1O -2 

Tirne(s) 

0.003 5xlo -5 

9.199,5x10-2 

0.150 7x1O -4 


T=1650, k(T)=10, K(T) 

=50 

Rand 

0.953 3xlo -3 

0.992 7x1Q -4 

0.940 5x1Q -3 

# of cps 

16.690o.4 

10.050 2x1Q -2 

9.390 7x1O -2 

Time(s) 

0.009 7xlo -5 

239.263 0 .r 

3-542 3x10 -2 


Table 1: Results of the first univariate simulation from Section 4.1 with different time series lengths. The true number of change 
points is given by k(T) and K(T) the upper limit used by e-cp3o. The table contains average values over 100 replicates, with standard 
error as subscripts. 


4.1 Univariate Simulations 

We begin our simulation study by comparing the e-cp3o procedure to the E-Divisive and PELT procedures. 
These two procedures are implemented in the ecp (James and Matteson, 2014) and changepoint R packages 
respectively. This set of simulations consist of independent Gaussian observations which undergo changes in their 
mean and variance. The distribution parameters were chosen so that ji 3 U{— 10,10) and <r| U(0, 5). For 
each analyzed time series all of the different change point procedures were run with their default parameter values. 
For E-Divisive and e-cp3o this corresponds to a = 1. And for e-cp3o the minimum segment size was set to 30 
observations (corresponding to S = 29), and a value of e = 0.01 is used for the probabilistic pruning. Since in this 
simulation study the number of change points increased with the time series length, the value of K(T ) would also 
change. The results of these simulations are in Table 1, which also includes additional information about the time 
series and upper limit K(T). As can be seen from Table 1, better results are obtained by combining an exact test 
statistic with an approximate search algorithm. But these gains in segmentation quality are rather small. Thus, 
because of the increase in speed and small loss in segmentation quality, we would argue that the e-cp3o procedure 
should be preferred over the E-Divisive. The PELT procedure was much faster, but the e-cp3o procedure was 
able to generate segmentations that were similar in quality as measured by the adjusted Rand index. 

The next set of simulations also compares to a nonparametric procedure from the npcp R package. This 
procedure, like the e-cp3o, is designed to detect changes in the joint distribution of multivariate time series. More 
information about this procedure, which we will denote by NPCP-F, is given in Section 4.2. Time series in this 
simulation study contain two changes in mean followed by a change in tail index. The changes in mean correspond 
to the data transitioning from a standard normal distribution to a N( 3,1) and then back to standard normal. 
The tail index change is caused by a transition to a t-distribution with 2.01 degrees of freedom. We expect that 
all three methods will be able to easily detect the mean changes and will have a more difficult time detecting the 
change in tail index. As with the previous set of simulations, all procedures are run with their default parameter 
values. Results for this set of simulations can be found in Table 2. Surprisingly, in this set of simulations the 
e-cp3o procedure was not only significantly faster than the E-Divisive and NPCP-F, but also managed to generate 
slightly better segmentations on average. 


12 









NPCP-F 

E-Div 

e-cp3o 


T= 

=400, k(T)=3, K(T)= 

=9 

Rand 

0.820.5 X io-3 

0.8285x10-3 

0.874 7x io-3 

# of cps 

2.280 6x1O -2 

2 . 200 5x io -2 

2.430 5x io-2 

Time 

4.790 2x io- 2 

6.726 5x io-2 

0.176io-3 


T= 

1600, k(T)=3, K(T) 

=9 

Rand 

0.8396x10-3 

0.864 8x io-3 

0.9175xio-3 

# of cps 

2.370 6x io-2 

2.48O7X10-2 

2.920 3x io-2 

Time 

71.772o.3 

143.207i.i 

2.392 4x io-2 


Table 2: Simulation results for time series with mean and 
tail index changes. The subscripts indicate the standard 
errors for each value. 



Figure 1: Example of time series with changes in mean 
and tail index. Mean changes occur at times 100 and 200, 
while the tail index change is at time 300. 


These two simulation studies on univariate time series show that the e-cp3o procedure performs well when 
compared to other parametric and nonparametric change point algorithms. The first set of simulations showed 
that it generated segmentations whose quality is comparable to that of an efficient parametric procedure when its 
parametric assumptions were satisfied. While the second set of simulations showed that it is able to handle more 
subtle distributional changes, such as a change in tail behavior. The flexibility of the e-cp3o method allows for it 
to be used when parametric assumptions are met, as well as in settings where they aren’t sure to be satisfied. 


4.2 Multivariate Simulations 

We now examine the performance of the e-cp3o procedure when applied to a multivariate time series. Since a 
change in mean can be seen as a change in a marginal distribution we could just apply any univariate method to 
each dimension of the dataset. For this reason we will examine a more complex type of distributional change. In 
this simulation the distributional change will be due to a change in the copula function (Sklar, 1959), while the 
marginal distributions remain unchanged. Since the PELT procedure as implemented in the changepoint package 
only performs marginal analysis it is not suited for this setting, and will thus not be part of our comparison. We 
instead consider a method proposed by Gombay and Horvath (1999) and implemented in the R package npcp by 
Holmes et al. (2013). This package provides two methods that can be used in this setting. One that looks for any 
change in the joint distribution (NPCP-F) and one designed to detect changes in the copula function (NPCP-C). 

For a given set of marginal distributions, the copula function is used to model their dependence. Thus a 
change in the copula function reflects a change in the dependence structure. This is of particular interest in 
finance where portfolios of dependent securities are typical (Guegan and Zhang, 2010). 

In this simulation we consider a two dimensional process where both marginal distributions are standard 
normal. While the marginal distributions remain static, the copula function evolves over time. For this simulation 
the copula undergoes two changes. Initially it is a Clayton copula and then changes to the independence copula 
and finally becomes a Gumbel copula. The density function for each of the used copulas is provided in Table 3 
and simulation results in Table 4. 

Copula Density c(u, v) 

Clayton (max{it -2 - 8 + v~ 2 ' 8 — 1, 0}) 5 ^ 14 

Independence uv 

Gumbel exp 


(-log(u)) 2 ' 8 + (-log(u)) 


2.8 


1 5/14 


Table 3: The densities for the copula functions used in the multivariate simulations. 
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(a) Clayton (b) Independence (c) Gumbel 

Figure 2: Contour plots of distributions used to generate data in the multivariate simulation. 



NPCP-C 

NPCP-F e-cp3o 


T=300, k(T)=2, K(T)=9 

Rand 

0.9584x10-3 

0.616 9x io-3 0.685 8x io-3 

# of cps 

2.1304x10-2 

0.320 6x io- 2 4.000o.i 

Time 

73.163 0 .i 

1.871 6x io- 2 0.1254xio-3 


T=1200, k(T)=2, K(T)=9 

Rand 

0 .979 3x io-3 

0.865i O -2 0.766iq-2 

# of cps 

2.1504x10-2 

1.790 9x io-2 1-5707xio-2 

Time 

10,580.831 4 .o 

41.270 o .7 1.901 Sx io-3 


Table 4: Results of the multivariate simulation with different time series lengths. The subscripts indicate the standard errors for 
each value. 


As was expected, in Table 4 it is clear that the NPCP-C method obtained the best average Rand value in 
all situations. But this comes at a much increased average running time. This becomes very problematic when 
analysis of a single longer time series can take almost three hours. For shorter time series the e-cp3o provides the 
best combination between running time, estimated number of change points, and Rand value. For longer time 
series the NPCP-F procedure is the clear winner. 


5 Real Data 

In this section we apply the e-cp3o procedure to two real data sets. For our first application we make use of a 
dataset of monthly temperature anomalies. The second second consists of monthly foreign exchange (FX) rates 
between the United States, Russia, Brazil, and Switzerland. 

5.1 Temperature Anomalies 

For the first application of the e-cp3o procedure we examine the HadCRUT4 dataset of Morice et al. (2012). 
This dataset consists of monthly global temperature anomalies from 1850 to 2014. Since the dataset consists of 
anomalies, it does not indicate actual average monthly temperatures, but instead measured deviations from some 
predefined baseline. The time period used to create the baseline in this case spans 1960 to 1990. 

The HadCRUT4 dataset contains two major components; one for land air temperature anomalies and another 
for sea-surface temperature anomalies. The analysis performed in this section will only consider the land air 
temperature anomaly component from the tropical region (30° South to 30° North). This region was chosen 
because it was the most likely of all the presented regions to have a small difference between the minimum and 
maximum anomaly value, and be affected by changing seasons. More information about the dataset and the 
averaging process used can be found in the paper by Morice et al. (2012). 

From looking at the plot of the tropical land air anomaly time series it is suspected that there is some 
dependence between observations. This assumption is quickly confirmed by looking at the auto-correlation plot. 
As a result, we apply the e-cp3o procedure to the differenced data which visually appears to be piecewise stationary. 
The auto-correlation plot for the differenced data shows that much of the linear dependence has been removed, 
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Figure 3: Change in land air temperature anomalies for the Tropical climate zone from February 1850 to December 2013. The cp3o 
estimated change point locations are indicated by dashed vertical lines. 


however, the same plot for the differences squared still indicates some dependence. As with the exchange rate 
data, we believe that this indicated dependence can be attributed to changes in distribution. 

The e-cp3o procedure was applied with a minimum segment length of one year, corresponding to 5 = 11; a 
maximum of K(T ) = 20 change points were fit, we chose a = 1, and e = 0.01. Upon completion we identified 
change points at the following dates: July 1860, February 1878, January 1918, and February 1973, which are 
shown in Figure 3. With these change points we notice that the auto-correlation plots, for both the differenced 
and squared differenced data, show almost no statistically significant correlations. This is in line with our original 
hypothesis that the previously observed correlation was due to the distributional changes within the data. 

Furthermore, the February 1973 change point occurs around the same time as the United Nations Conference 
on the Human Environment. This conference, which was held in June 1972, focused on human interactions with 
the environment. From this meeting came a few noteworthy agreed upon principles that have potential to impact 
land air temperatures: 

1. Pollution must not exceed the environment’s ability to clean itself 

2. Governments would plan their own appropriate pollution policies 

3. International organizations should help to improve the environment 

These measures, undoubtedly played a role in the decreased average anomaly size, as well as an almost 66% 
decrease in the variance. 

5.2 Exchange Rates 

We next apply the e-cp3o procedure to a set of spot FX rates obtained through the R package Quandl (McTaggart 
and Daroczi, 2013). For our analysis we consider the three dimensional time series consisting of monthly FX 
rates for Brazil (BRL), Russia (RUB), and Switzerland (CHF). All of the rates are against the United States 
(USD). The time horizon spanned by this time series is September 30, 1996 to February 28, 2014, which results 
in a total of 210 observations. Looking at the marginal series it is obvious that each of the individual FX rates 
does not generate a stationary process. Thus, instead of looking at the actual rate, we look at the change in the 
log process. This transformation results in marginal processes that appear to at least be piecewise stationary. 

Our procedure is only guaranteed to work with independent observations, so we must hope that our data either 
satisfies this condition or is very close to it. The papers by Hsich (1988, 1989) provide evidence that changes 
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(a) Brazil (b) Switzerland (c) Russia 

Figure 4: Time series for FX spot rates for each of the three countries. Estimated change point locations indicated by vertical lines. 


in the daily exchange rate are not independent, and that there is a reasonable amount of nonlinear dependence. 
However, they are not able to conclude whether this observed dependence is due to distributional changes or some 
other phenomena. For this reason we are instead interested in the change in the monthly exchange rate, which 
is more likely to either be weakly dependent or show no dependence. To check this we examine the auto/cross¬ 
correlation plots for both the difference and difference squared data. This preliminary analysis shows that there 
is no significant auto or cross-correlation within the differenced data, while for the squared differences there is 
only significant auto-correlation for Switzerland at a lag of one month. 

The e-cp3o procedure is applied with a minimum segment length of six observations (half a year), which 
corresponds to a value of S = 5. Furthermore, we have chosen to fit at most K(T ) = 15 change points, and values 
of a = 1 and e = 0.01 were used. This specific choice of values resulted in change points being identified at May 
31, 1998 and March 31, 2000. These results are depicted in Figure 4. 

It can be argued that changes in Russia’s economic standing leading up to the 1998 ruble crisis are the causes 
of the May 31, 1998 change point. During the Asian financial crisis many investors were losing faith in the Russian 
ruble. At one point, the yield on government bonds was as high as 47%. This paired with a 10% inflation rate 
would normally have been an investor’s dream come true. However, people were skeptical of the government’s 
ability to repay these bonds. Furthermore, at this time Russia was using a floating pegged rate for its currency, 
which resulted in the Central Bank’s mass expenditure of USD’s which further weakened the ruble’s position. 

The change point identified at March 31, 2000 also coincides with an economic shift in one of the examined 
countries. The country most likely to be the cause of this change is Brazil. In 1994 the Brazilian government 
pegged their currency to the USD. This helped to stabilize the county’s inflation rate; however, because of the 
Asian financial crisis and the ruble crisis many investors were averse to investing in Brazil. In January 1999 
the Brazilian Central Bank announced that they would be changing to a free float exchange regime, thus their 
currency was no longer pegged to the USD. This change devalued the currency and helped to slow the current 
economic downturn. The change in exchange regime and other factors led to a 48% debt to GDP ratio, besting 
the IMF target and thus increasing investor faith in Brazil. 


6 Conclusion 

We have presented an exact search algorithm that incorporates probabilistic pruning in order to reduce the 
amount of unnecessary calculations. This search method can be used with almost any goodness of fit measure 
in order to identify change points in multivariate time series. Asymptotic theory has also been provided showing 
that the cp3o algorithm can generate consistent estimates for both the number of change points as well as the 
change point locations as the time series increases, provided that a suitable goodness of fit measure is provided. 
Furthermore, the decoupling of the search procedure and the determination of the number of estimated change 
points allows for the cp3o algorithm to efficiently generate a collection of optimal segmentations, with differing 
numbers of change points. This is all accomplished without the user having to specify any sort of penalty constant 
or function. 

By combining the cp3o search algorithm with E-Statistics we developed e-cp3o, a method to perform nonpara- 
metric multiple change point analysis that can detect any type of distributional change. This method combines 
an approximate statistic with an exact search algorithm. The slight loss in accurately estimating change point 
locations on finite time series is greatly outweighed by the dramatic increase in speed, when compared to similar 
methods that combine an exact statistic with an approximate search algorithm. 
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